1.
J Comput Graph Stat ; 32(3): 1109-1118, 2023.
Article in English | MEDLINE | ID: mdl-37982131

ABSTRACT

Selecting a small set of informative features from a large number of possibly noisy candidates is a challenging problem with many applications in machine learning and approximate Bayesian computation. In practice, the cost of computing informative features also needs to be considered. This is particularly important for networks because the computational costs of individual features can span several orders of magnitude. We addressed this issue for the network model selection problem using two approaches. First, we adapted nine feature selection methods to account for the cost of features. We show for two classes of network models that the cost can be reduced by two orders of magnitude without considerably affecting classification accuracy (proportion of correctly identified models). Second, we selected features using pilot simulations with smaller networks. This approach reduced the computational cost by a factor of 50 without affecting classification accuracy. To demonstrate the utility of our approach, we applied it to three different yeast protein interaction networks and identified the best-fitting duplication divergence model. Supplemental materials, including computer code to reproduce our results, are available online.
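The idea of trading feature usefulness against computational cost can be illustrated with a minimal sketch. It is not one of the nine adapted methods from the paper: the linear penalty, the function name `select_features`, and the relevance and cost inputs are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def select_features(
    relevance: Dict[str, float],
    cost: Dict[str, float],
    penalty: float,
    max_features: int,
) -> List[str]:
    """Rank features by relevance minus a cost penalty and keep the best ones.

    `relevance` (a per-feature usefulness score) and `cost` (a per-feature
    compute time) are hypothetical inputs, not quantities from the paper.
    """
    scored: List[Tuple[float, str]] = []
    for name, rel in relevance.items():
        # Cost-adjusted score: expensive features must be more useful to win.
        adjusted = rel - penalty * cost[name]
        scored.append((adjusted, name))
    # Highest adjusted score first; ties are broken by name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop features whose cost outweighs their relevance entirely.
    return [name for adjusted, name in scored[:max_features] if adjusted > 0]


if __name__ == "__main__":
    # Made-up scores and costs (seconds) for three network features.
    relevance = {"degree_mean": 0.60, "clustering": 0.65, "betweenness": 0.70}
    cost = {"degree_mean": 0.001, "clustering": 0.1, "betweenness": 10.0}
    print(select_features(relevance, cost, penalty=0.0, max_features=2))
    print(select_features(relevance, cost, penalty=0.1, max_features=2))
```

With no penalty, the two most relevant features are chosen regardless of cost. With a positive penalty, the expensive feature drops out in favour of cheaper ones that are nearly as relevant.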

2.
Bayesian Anal ; 17(1): 165-192, 2022 Mar.
Article in English | MEDLINE | ID: mdl-36213769

ABSTRACT

Approximate Bayesian computation (ABC) is a simulation-based likelihood-free method applicable to both model selection and parameter estimation. ABC parameter estimation requires the ability to forward simulate datasets from a candidate model, but because the sizes of the observed and simulated datasets usually need to match, this can be computationally expensive. Additionally, since ABC inference is based on comparisons of summary statistics computed on the observed and simulated data, using computationally expensive summary statistics can lead to further losses in efficiency. ABC has recently been applied to the family of mechanistic network models, an area that has traditionally lacked tools for inference and model choice. Mechanistic models of network growth repeatedly add nodes to a network until it reaches the size of the observed network, which may be of the order of millions of nodes. With ABC, this process can quickly become computationally prohibitive due to the resource-intensive nature of network simulations and evaluation of summary statistics. We propose two methodological developments to enable the use of ABC for inference in models for large growing networks. First, to save the time needed for forward simulating model realizations, we propose a procedure to extrapolate (via both least squares and Gaussian processes) summary statistics from small to large networks. Second, to reduce computation time for evaluating summary statistics, we use sample-based rather than census-based summary statistics. We show that the ABC posterior obtained through this approach, which adds two layers of approximation to the standard ABC, is similar to a classic ABC posterior. Although we deal with growing network models, both extrapolated summaries and sampled summaries are expected to be relevant in other ABC settings where the data are generated incrementally.
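The least squares extrapolation step can be sketched as follows. This is a minimal illustration only: the assumption that the summary statistic is linear in the logarithm of network size, and the names `fit_line` and `extrapolate_summary`, are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def extrapolate_summary(
    sizes: List[int], values: List[float], target_size: int
) -> float:
    """Predict a summary statistic at `target_size` from smaller networks.

    Assumes, purely for illustration, that the statistic is linear in
    log(network size); the true functional form is model dependent.
    """
    log_sizes = [math.log(s) for s in sizes]
    intercept, slope = fit_line(log_sizes, values)
    return intercept + slope * math.log(target_size)


if __name__ == "__main__":
    # Made-up summary values recorded at three small network sizes.
    sizes = [100, 1_000, 10_000]
    values = [2.0 + 3.0 * math.log(s) for s in sizes]
    print(extrapolate_summary(sizes, values, target_size=1_000_000))
```

The point of the sketch is that only small networks are simulated, and the value at the observed size is predicted rather than computed.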

3.
Sci Rep ; 12(1): 6985, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35484268

ABSTRACT

During the COVID-19 pandemic, many countries implemented international travel restrictions that aimed to contain viral spread while still allowing necessary cross-border travel for social and economic reasons. The relative effectiveness of these approaches for controlling the pandemic has gone largely unstudied. Here we developed a flexible network meta-population model to compare the effectiveness of international travel policies, with a focus on evaluating the benefit of policy coordination. Because country-level epidemiological parameters are unknown, they need to be estimated from data; we accomplished this using approximate Bayesian computation, given the nature of our complex stochastic disease transmission model. Based on simulation and theoretical insights we find that, under our proposed policy, international airline travel may resume up to 58% of the pre-pandemic level with pandemic control comparable to that of a complete shutdown of all airline travel. Our results demonstrate that global coordination is necessary to allow for maximum travel with minimum effect on viral spread.


Subjects
COVID-19 , Influenza, Human , Bayes Theorem , COVID-19/epidemiology , COVID-19/prevention & control , Humans , Influenza, Human/epidemiology , Pandemics/prevention & control , Travel
4.
Mol Ecol Resour ; 21(8): 2598-2613, 2021 Nov.
Article in English | MEDLINE | ID: mdl-33950563

ABSTRACT

Simulation-based methods such as approximate Bayesian computation (ABC) are well-adapted to the analysis of complex scenarios describing the genetic history of populations and species. In this context, supervised machine learning (SML) methods provide attractive statistical solutions to conduct efficient inferences about scenario choice and parameter estimation. The Random Forest methodology (RF) is a powerful ensemble of SML algorithms used for classification or regression problems. Random Forest allows inferences to be conducted at a low computational cost, without preliminary selection of the relevant components of the ABC summary statistics, and bypassing the derivation of ABC tolerance levels. We have implemented a set of RF algorithms to perform inferences using simulated data sets generated from an extended version of the population genetic simulator implemented in DIYABC v2.1.0. The resulting computer package, named DIYABC Random Forest v1.0, integrates two functionalities into a user-friendly interface: the simulation under custom evolutionary scenarios of different types of molecular data (microsatellites, DNA sequences or SNPs) and RF treatments including statistical tools to evaluate the power and accuracy of inferences. We illustrate the functionalities of DIYABC Random Forest v1.0 for both scenario choice and parameter estimation through the analysis of pseudo-observed and real data sets corresponding to pool-sequencing and individual-sequencing SNP data sets. Because of the properties inherent to the implemented RF methods and the large feature vector (including various summary statistics and their linear combinations) available for SNP data, DIYABC Random Forest v1.0 can efficiently contribute to the analysis of large SNP data sets to make inferences about complex population genetic histories.


Subjects
Algorithms , Genetics, Population , Bayes Theorem , Computer Simulation , Demography , Polymorphism, Single Nucleotide , Supervised Machine Learning
5.
medRxiv ; 2021 Apr 20.
Article in English | MEDLINE | ID: mdl-33907768


6.
Preprint in English | medRxiv | ID: ppmedrxiv-21255465


7.
Mol Ecol ; 29(23): 4542-4558, 2020 12.
Article in English | MEDLINE | ID: mdl-33000872

ABSTRACT

Dating population divergence within species from molecular data and relating such dating to climatic and biogeographic changes is not trivial. Yet it can help formulate evolutionary hypotheses regarding local adaptation and future responses to changing environments. Key issues include statistical selection of a demographic and historical scenario among a set of possible scenarios, and estimation of the parameter(s) of interest under the chosen scenario. Such inferences greatly benefit from (a) independent information on evolutionary rate and pattern at genetic markers; and (b) new statistical approaches, such as approximate Bayesian computation-random forest (ABC-RF), which provides reliable inference at a low computational cost and the possibility to measure prediction quality at the exact position of the observed data set. Here, we show the full potential of the ABC-RF approach, including prior knowledge on microsatellite genetic markers, to decipher the evolutionary history of the African arid-adapted pest locust, Schistocerca gregaria, with support for a southern colonization of Africa, from a low number of founders of northern origin, dating back 2.6 Ky (90% CI: 0.9-6.6 Ky). We verify that this divergence time estimate accurately reflected true divergence time values by computing accuracy at a local posterior scale from simulated pseudo-observed data sets. The inferred divergence history is better explained by the peculiar biology of S. gregaria, which involves a density-dependent swarming phase with some exceptional spectacular migrations, rather than a continuous colonization resulting from the continental expansion of open vegetation habitats during more ancient Quaternary glacial climatic episodes.


Subjects
Genetics, Population , Grasshoppers , Africa , Animals , Bayes Theorem , Genetic Variation , Grasshoppers/genetics , Microsatellite Repeats/genetics
8.
Bioinformatics ; 35(10): 1720-1728, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30321307

ABSTRACT

MOTIVATION: Approximate Bayesian computation (ABC) has grown into a standard methodology that manages Bayesian inference for models associated with intractable likelihood functions. Most ABC implementations require the preliminary selection of a vector of informative statistics summarizing raw data. Furthermore, in almost all existing implementations, the tolerance level that separates acceptance from rejection of simulated parameter values needs to be calibrated. RESULTS: We propose to conduct likelihood-free Bayesian inferences about parameters with no prior selection of the relevant components of the summary statistics and bypassing the derivation of the associated tolerance level. The approach relies on the random forest (RF) methodology of Breiman (2001) applied in a (non-parametric) regression setting. We advocate the derivation of a new RF for each component of the parameter vector of interest. When compared with earlier ABC solutions, this method offers significant gains in terms of robustness to the choice of the summary statistics, does not depend on any type of tolerance level, and is a good trade-off in terms of the quality of point estimator precision and credible interval estimation for a given computing time. We illustrate the performance of our methodological proposal and compare it with earlier ABC methods on a Normal toy example and a population genetics example dealing with human population evolution. AVAILABILITY AND IMPLEMENTATION: All methods designed here have been incorporated in the R package abcrf (version 1.7.1) available on CRAN. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
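For context, the tolerance level mentioned above can be seen in a minimal sketch of classic rejection ABC on a Normal toy problem. This is the earlier style of ABC that the abstract contrasts with, not the random forest method itself and not the abcrf API. The uniform prior on [-5, 5], the unit variance, the sample mean as the single summary statistic, and the name `abc_rejection` are all assumptions for illustration.

```python
import random
from typing import List


def abc_rejection(
    observed: List[float], n_draws: int, tolerance: float, seed: int = 0
) -> List[float]:
    """Classic rejection ABC for the mean of a Normal with unit variance.

    A prior draw is kept when the simulated sample mean falls within
    `tolerance` of the observed sample mean.
    """
    rng = random.Random(seed)
    n = len(observed)
    s_obs = sum(observed) / n  # observed summary statistic
    accepted: List[float] = []
    for _ in range(n_draws):
        mu = rng.uniform(-5.0, 5.0)  # draw from the assumed uniform prior
        simulated = [rng.gauss(mu, 1.0) for _ in range(n)]
        s_sim = sum(simulated) / n  # simulated summary statistic
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(mu)  # mu is an approximate posterior draw
    return accepted


if __name__ == "__main__":
    data = [1.0] * 20  # made-up observations with sample mean 1.0
    for tol in (1.0, 0.2):
        draws = abc_rejection(data, n_draws=2000, tolerance=tol, seed=1)
        print(tol, len(draws))
```

A smaller tolerance gives a sharper approximation but accepts fewer draws, which is why the tolerance has to be calibrated in this style of ABC.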


Subjects
Bayes Theorem , Biometry , Computer Simulation , Genetics, Population , Humans , Likelihood Functions